Abstract. We define and study the link prediction problem in bipartite 
networks, specializing general link prediction algorithms to the bipartite 
case. In a graph, a link prediction function of two vertices denotes the 
similarity or proximity of the vertices. Common link prediction func- 
tions for general graphs are defined using paths of length two between 
two nodes. Since in a bipartite graph adjacent vertices can only be connected 
by paths of odd lengths, these functions do not apply to bipartite 
graphs. Instead, a certain class of graph kernels (spectral transforma- 
tion kernels) can be generalized to bipartite graphs when the positive- 
semidefinite kernel constraint is relaxed. This generalization is realized 
by the odd component of the underlying spectral transformation. This 
construction leads to several new link prediction pseudokernels such as 
the matrix hyperbolic sine, which we examine for rating graphs, author- 
ship graphs, folksonomies, document-feature networks and other types 
of bipartite networks. 



1 Introduction 

In networks where edges appear over time, the problem of predicting such edges 
is called link prediction [1,2]. Common approaches to link prediction can be de- 
scribed as local when only the immediate neighborhood of vertices is considered 
and latent when a latent model of the network is used. An example of a local 
link prediction method is the triangle closing model; such models are 
conceptually very simple. Latent link prediction methods are instead derived using 
algebraic graph theory: The network's adjacency matrix is decomposed and a 
transformation is applied to the network's spectrum. This approach is predicted 
by several graph growth models and results in graph kernels, positive-semidefinite 
functions of the adjacency matrix [3]. 

Many networks contain edges between two types of entities, for instance item 
rating graphs, authorship graphs and document-feature networks. These graphs 
are called bipartite [4], and while they are a special case of general graphs, link 
prediction methods cannot be generalized to them. As we show in Section 2, 
this is the case for all link prediction functions based on the triangle closing 
model, as well as all positive-semidefinite graph kernels. Instead, we will see in 
Section 3 that their odd components can be used. For each positive-semidefinite 
graph kernel, we derive the corresponding odd pseudokernel. One example is the 
exponential graph kernel exp(A). Its odd component is sinh(A), the hyperbolic 
sine. We also introduce the bipartite von Neumann pseudokernel, and study the 
bipartite versions of polynomials with only odd powers. We show experimentally 
(in Section 4) how these odd pseudokernels perform on the task of link prediction 
in bipartite networks in comparison to their positive counterparts, and give an 
overview of their relative performance. We also sketch their use for detecting 
near-bipartite graphs. 

2 Bipartite Link Prediction 

The link prediction problem is usually defined on unipartite graphs, where com- 
mon link prediction algorithms make several assumptions [5]: 

— Triangle closing: New edges tend to form triangles. 

— Clustering: Nodes tend to form well-connected clusters in the graph. 

In bipartite graphs these assumptions are not true, since triangles and larger 
cliques cannot appear. Other assumptions have therefore to be used. While a 
unipartite link prediction algorithm technically applies to bipartite graphs, it 
will not perform well. Methods based on common neighbors of two vertices will 
for instance not be able to predict anything in bipartite graphs, since two ver- 
tices that would be connected (from different clusters) do not have any common 
neighbors. 

Several important classes of networks are bipartite: authorship networks, in- 
teraction networks, usage logs, ontologies and many more. Many unipartite net- 
works (such as coauthorship networks) can be reinterpreted as bipartite networks 
when edges or cliques are modeled as vertices. In these cases, special bipartite 
link prediction algorithms are necessary. The following two sections will review 
local and algebraic link prediction methods for bipartite graphs. Examples of 
specific networks of these types will be given in Section 4. 

Definitions Given an undirected graph $G = (V, E)$ with vertex set $V$ and edge 
set $E$, its adjacency matrix $A \in \mathbb{R}^{|V| \times |V|}$ is defined as $A_{uv} = 1$ when 
$\{u, v\} \in E$ and $A_{uv} = 0$ otherwise. For a bipartite graph $G = (V + W, E)$, the 
adjacency matrix can be written as $A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix}$, where 
$B \in \mathbb{R}^{|V| \times |W|}$ is the biadjacency matrix of $G$. 

2.1 Local Link Prediction Functions 

Some link prediction functions only depend on the immediate neighborhood of 
two nodes; we will call these functions local link prediction functions [1]. 

Let u and v be two nodes in the graph for which a link prediction score is to be 
computed. Local link prediction functions depend on the common neighbors of u 
and v. In the bipartite link prediction problem, u and v are in different clusters, 
and thus have no common neighbors. The following link prediction functions 
are therefore not applicable to bipartite graphs: the number of common 
neighbors [1], the measure of Adamic and Adar [6] and the Jaccard coefficient [1]. 
These methods are all based on the triangle closing model, which is not valid 
for bipartite graphs. 

(a) Unipartite network (b) Bipartite network 

Fig. 1. Link prediction by spreading activation in unipartite and bipartite networks. 
In the unipartite case, all paths are used. In the bipartite case, only paths of odd 
length need to be considered. In both cases, paths are weighted in inverse 
proportion to path length. 

Preferential Attachment Taking only the degree of u and v into account for 
link prediction leads to the preferential attachment model [7], which can be used 
as a model for more complex methods such as modularity kernels [8, 9]. 

If $d(u)$ is the number of neighbors of node $u$, the preferential attachment 
model gives a prediction between $u$ and $v$ of $d(u) d(v) / (2|E|)$. The factor $1/(2|E|)$ 
normalizes the sum of predictions for a vertex to its degree. 
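
The normalization can be checked numerically. The following is a minimal sketch in Python with NumPy, not taken from the original experiments; the function name and the toy graph are assumptions made for illustration. It computes the score $d(u) d(v) / (2|E|)$ for every vertex pair and confirms that the scores of each vertex sum to its degree.

```python
import numpy as np

def preferential_attachment_scores(A):
    """Scores d(u) d(v) / (2|E|) for a symmetric 0/1 adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)       # d(u): number of neighbors of u
    two_m = degrees.sum()         # the degrees sum to 2|E|
    return np.outer(degrees, degrees) / two_m

# Hypothetical toy bipartite graph: V = {0, 1}, W = {2, 3, 4}, |E| = 4.
B = np.array([[1, 1, 0],
              [0, 1, 1]])
A = np.block([[np.zeros((2, 2)), B],
              [B.T, np.zeros((3, 3))]])

scores = preferential_attachment_scores(A)
row_sums = scores.sum(axis=1)     # equals the degree vector
```

Note that the score matrix is dense and also assigns values to pairs on the same side of the bipartition; for bipartite link prediction only the block of pairs between V and W would be used.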

3 Algebraic Link Prediction Functions 

Link prediction algorithms that not only take into account the immediate neigh- 
borhood of two nodes but the complete graph can be formulated using algebraic 
graph theory, whereby a decomposition of the graph's adjacency matrix is com- 
puted [10]. By considering transformations of a graph's adjacency matrix, link 
prediction methods can be defined and learned. Algebraic link prediction meth- 
ods are motivated by their scalability and their learnability. They are scalable 
because they rely on a model that is built once and which makes computation 
of recommendations fast. These models correspond to decomposed matrices and 
can usually be updated using iterative algorithms [11]. In contrast, local link 
prediction algorithms are memory-based, meaning they access the adjacency data 
directly during link prediction. Algebraic link prediction methods are learnable 
because their parameters can be learned in a unified way [12]. 




In this section, we describe how algebraic link prediction methods apply 
to bipartite networks. Let $G = (V, E)$ be a (not necessarily bipartite) graph. 
Algebraic link prediction algorithms are based on the eigenvalue decomposition 
of its adjacency matrix $A$: 

$$ A = U \Lambda U^T $$

To predict links, a spectral transformation is usually applied: 

$$ F(A) = U F(\Lambda) U^T $$

where $F(\Lambda)$ applies a real function $f(\lambda)$ to each eigenvalue $\lambda_i$. $F(A)$ then contains 
link prediction scores that, for each node, give a ranking of all other nodes, which 
is then used for link prediction. If $f(\lambda_i)$ is positive, $F$ is a graph kernel; otherwise, 
we will call $F$ a pseudokernel. 

Several spectral transformations can be written as polynomials of the 
adjacency matrix in the following way. The matrix power $A^i$ gives, for each vertex 
pair $(u, v)$, the number of paths of length $i$ between $u$ and $v$. Therefore, a 
polynomial of $A$ gives, for a pair $(u, v)$, the sum of all paths between $u$ and $v$, weighted 
by the polynomial coefficients. This fact can be exploited to find link prediction 
functions that fulfill the two following requirements: 

— The link prediction score should be higher when two nodes are connected by 
many paths. 

— The link prediction score should be higher when paths are short. 

These requirements suggest the use of polynomials $f$ with decreasing coefficients. 

3.1 Odd Pseudokernels 

In bipartite networks, only paths of odd length are significant, since an edge 
can only appear between two vertices if they are already connected by paths of 
odd lengths. Therefore, only odd powers are relevant, and we can restrict the 
spectral transformation to odd polynomials, i.e. polynomials with odd powers. 

The resulting spectral transformation is then an odd function and, except 
in the trivial and undesired case of a constant zero function, will be negative 
at some point. Therefore, all spectral transformations described below are only 
pseudokernels and not kernels. 

The Hyperbolic Sine In unipartite networks, a basic link prediction function 
is given by the matrix exponential of the adjacency matrix [13-15]. The matrix 
exponential can be derived by considering the sum 

$$ \exp(\alpha A) = \sum_{i=0}^{\infty} \frac{\alpha^i}{i!} A^i $$

where coefficients are decreasing with path length. Keeping only the odd 
component, we arrive at the matrix hyperbolic sine [16]. 

$$ \sinh(\alpha A) = \sum_{i=0}^{\infty} \frac{\alpha^{1+2i}}{(1+2i)!} A^{1+2i} $$

Figure 2 shows the hyperbolic sine applied to the (positive) spectrum of the 
bipartite Slovak Wikipedia user-article edit network. 

Fig. 2. In this curve fitting plot of the Slovak Wikipedia, the hyperbolic sine is a good 
match, indicating that the hyperbolic sine pseudokernel performs well. 
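
To make the construction concrete, the following minimal sketch in Python with NumPy computes the hyperbolic sine pseudokernel in two ways: as a spectral transformation of the eigenvalues and as a truncated sum of odd matrix powers. The function names, the toy graph, the value of α and the number of series terms are assumptions chosen for illustration, not values from the experiments.

```python
import numpy as np

def spectral_transform(A, f):
    """F(A) = U f(Lambda) U^T for a symmetric matrix A."""
    eigenvalues, U = np.linalg.eigh(A)
    return (U * f(eigenvalues)) @ U.T     # scales column i of U by f(lambda_i)

def odd_power_series(A, alpha, terms=20):
    """Truncated series: sum_i alpha^(1+2i) / (1+2i)! * A^(1+2i)."""
    result = np.zeros_like(A, dtype=float)
    term = alpha * A                      # term for i = 0
    for i in range(terms):
        result += term
        k = 2 * i + 1                     # current odd power
        term = term @ A @ A * alpha**2 / ((k + 1) * (k + 2))
    return result

# Hypothetical toy bipartite graph: V = {0, 1}, W = {2, 3, 4}.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
A = np.block([[np.zeros((2, 2)), B],
              [B.T, np.zeros((3, 3))]])
alpha = 0.5

S = spectral_transform(A, lambda lam: np.sinh(alpha * lam))
E = spectral_transform(A, lambda lam: np.exp(alpha * lam))
series = odd_power_series(A, alpha)
```

Under these assumptions, `S` agrees with `series`, and the blocks of `S` that relate vertices on the same side of the bipartition are zero, whereas the corresponding blocks of `E` are not. This is the sense in which the odd component only scores pairs that could be joined by an edge.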

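The Odd von Neumann Pseudokernel The von Neumann kernel for 
unipartite graphs is given by the following expression [13]. 

$$ K_{NEU}(A) = (I - \alpha A)^{-1} = \sum_{i=0}^{\infty} \alpha^i A^i $$

We call its odd component the odd von Neumann pseudokernel: 

$$ K_{NEU}^{odd}(A) = \alpha A (I - \alpha^2 A^2)^{-1} = \sum_{i=0}^{\infty} \alpha^{1+2i} A^{1+2i} $$
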

The hyperbolic sine and von Neumann pseudokernels are compared in Fig- 
ure 3, based on the path weights they produce. 
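
The closed form of the odd von Neumann pseudokernel can be checked against the odd component of the von Neumann kernel, i.e. half the difference of the kernel evaluated at α and at −α. The sketch below uses Python with NumPy; the function names, the toy graph and the value of α are assumptions, and α is chosen smaller than the inverse of the largest absolute eigenvalue of A so that both series converge.

```python
import numpy as np

def von_neumann(A, alpha):
    """(I - alpha A)^{-1}, valid when alpha times the spectral radius of A is below 1."""
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - alpha * A)

def odd_von_neumann(A, alpha):
    """alpha A (I - alpha^2 A^2)^{-1}, computed by solving a linear system."""
    n = A.shape[0]
    # A commutes with (I - alpha^2 A^2)^{-1}, so the order of the product is irrelevant.
    return np.linalg.solve(np.eye(n) - alpha**2 * (A @ A), alpha * A)

# Hypothetical toy bipartite graph; its largest eigenvalue is sqrt(3).
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
A = np.block([[np.zeros((2, 2)), B],
              [B.T, np.zeros((3, 3))]])
alpha = 0.3

odd = odd_von_neumann(A, alpha)
half_difference = (von_neumann(A, alpha) - von_neumann(A, -alpha)) / 2
```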




Fig. 3. Comparison of several odd pseudokernels: the hyperbolic sine and the odd von 
Neumann pseudokernel. The relative path weight is proportional to the corresponding 
coefficient in the Taylor series expansion of the spectral transformation. 



Rank Reduction Similarly, rank reduction of the matrix $A$ can be described 
as a pseudokernel. Let $\lambda_k$ be the eigenvalue with $k$-th largest absolute value, 
then rank reduction is defined by 

$$ f(\lambda) = \begin{cases} \lambda & \text{if } |\lambda| \geq |\lambda_k| \\ 0 & \text{otherwise} \end{cases} $$

This function is odd, but does not have an (odd) Taylor series expansion. 
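
As an illustration, rank reduction can be written as a spectral transformation in a few lines of Python with NumPy. This is a sketch under assumptions: the function name and toy graph are hypothetical, and the threshold uses "greater than or equal", so that in a bipartite graph, where eigenvalues come in pairs of equal absolute value, an even k keeps both members of each pair.

```python
import numpy as np

def rank_reduction(A, k):
    """Keep the k eigenvalues of largest absolute value and set the rest to zero."""
    eigenvalues, U = np.linalg.eigh(A)
    threshold = np.sort(np.abs(eigenvalues))[-k]   # k-th largest absolute value
    kept = np.where(np.abs(eigenvalues) >= threshold, eigenvalues, 0.0)
    return (U * kept) @ U.T

# Hypothetical toy bipartite graph with eigenvalues -sqrt(3), -1, 0, 1, sqrt(3).
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
A = np.block([[np.zeros((2, 2)), B],
              [B.T, np.zeros((3, 3))]])

R = rank_reduction(A, k=2)   # keeps the eigenvalue pair +sqrt(3), -sqrt(3)
```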



3.2 Computing Latent Graph Models 

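Bipartite graphs have adjacency matrices of the form 

$$ A = \begin{bmatrix} 0 & B \\ B^T & 0 \end{bmatrix} $$

where $B$ is the biadjacency matrix of the graph. This form can be exploited 
to reduce the eigenvalue decomposition of $A$ to the equivalent singular value 
decomposition $B = \tilde{U} \Sigma \tilde{V}^T$: 

$$ A = \begin{bmatrix} \bar{U} & \bar{U} \\ \bar{V} & -\bar{V} \end{bmatrix} \begin{bmatrix} \Sigma & 0 \\ 0 & -\Sigma \end{bmatrix} \begin{bmatrix} \bar{U} & \bar{U} \\ \bar{V} & -\bar{V} \end{bmatrix}^T $$

with $\bar{U} = \tilde{U}/\sqrt{2}$, $\bar{V} = \tilde{V}/\sqrt{2}$, and each singular value $\sigma$ corresponds to the 
eigenvalue pair $\{\pm\sigma\}$. 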
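
A practical consequence is that an odd spectral transformation of $A$ can be computed from the singular value decomposition of $B$ alone: since $f(-\sigma) = -f(\sigma)$, the block of $F(A)$ between the two vertex sets equals $\tilde{U} f(\Sigma) \tilde{V}^T$. The following sketch in Python with NumPy checks this on a toy graph; the function names, the graph and α are assumptions for illustration.

```python
import numpy as np

def bipartite_odd_scores(B, f):
    """Scores between the two vertex sets for an odd function f, using only the SVD of B."""
    U, sigma, Vt = np.linalg.svd(B, full_matrices=False)
    return (U * f(sigma)) @ Vt

def full_spectral_transform(A, f):
    """Reference computation F(A) = U f(Lambda) U^T on the full adjacency matrix."""
    eigenvalues, U = np.linalg.eigh(A)
    return (U * f(eigenvalues)) @ U.T

# Hypothetical toy bipartite graph: 2 vertices in V, 3 vertices in W.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
A = np.block([[np.zeros((2, 2)), B],
              [B.T, np.zeros((3, 3))]])
alpha = 0.5
f = lambda x: np.sinh(alpha * x)

scores_svd = bipartite_odd_scores(B, f)             # shape (2, 3)
scores_full = full_spectral_transform(A, f)[:2, 2:] # same block from the full matrix
```

The singular value decomposition works on a matrix of size $|V| \times |W|$ instead of $(|V| + |W|) \times (|V| + |W|)$, which is the saving this section describes.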



3.3 Learning Pseudokernels 

The hyperbolic sine and the von Neumann pseudokernel are parametrized by $\alpha$, 
and rank reduction has the parameter $k$, or equivalently $\lambda_k$. These 
parameters can be learned by reducing the spectral transformation problem to a 
one-dimensional curve fitting problem, as described in [12]. In the bipartite case, we 
can apply the curve fitting method to only the graph's singular values, since odd 
spectral transformations fit the negative eigenvalues in the same way they fit the 
positive eigenvalues. This kernel learning method is shown in Figure 4. 
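
The details of the curve fitting method are given in [12] and are not repeated here. The sketch below, in Python with NumPy, shows one way such a fit could look and rests on several assumptions that are ours, not the paper's: the fitting targets are taken to be the diagonal of $\tilde{U}^T B_{test} \tilde{V}$, where $\tilde{U}$ and $\tilde{V}$ come from the singular value decomposition of the training biadjacency matrix; the fitted curve is $\beta \sinh(\alpha \sigma)$ with a free scale $\beta$; and α is found by a simple grid search. All function names are hypothetical.

```python
import numpy as np

def spectral_targets(B_train, B_test):
    """Singular values of B_train and the assumed targets diag(U^T B_test V)."""
    U, sigma, Vt = np.linalg.svd(B_train, full_matrices=False)
    return sigma, np.diag(U.T @ B_test @ Vt.T)

def fit_sinh(sigma, target, alphas):
    """Grid search over alpha; for each alpha the least-squares scale beta is closed-form."""
    best = None
    for alpha in alphas:
        basis = np.sinh(alpha * sigma)
        beta = basis @ target / (basis @ basis)
        error = np.sum((beta * basis - target) ** 2)
        if best is None or error < best[2]:
            best = (alpha, beta, error)
    return best

# Synthetic check: targets generated from a known curve are recovered.
sigma_demo = np.linspace(0.1, 3.0, 30)
target_demo = 2.0 * np.sinh(0.7 * sigma_demo)
alpha_hat, beta_hat, err = fit_sinh(sigma_demo, target_demo, np.linspace(0.1, 2.0, 20))
```

Because only the singular values enter the fit, the problem stays one-dimensional regardless of the size of the network.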



(a) MovieLens 10M (b) English Wikipedia 

Fig. 4. Learning a pseudokernel that matches an observed spectral transformation in 
the MovieLens 10M rating network and English Wikipedia edit history. 

4 Experiments 

As experiments, we show the performance of bipartite link prediction functions 
on several large datasets, and present a simple method for detecting bipartite or 
near-bipartite datasets. 

4.1 Performance on Large Bipartite Networks 

We evaluate all bipartite link prediction functions on the following bipartite 
network datasets. BibSonomy is a folksonomy of scientific publications [17]. 
BookCrossing is a bipartite user-book interaction network [18]. CiteULike is 
a network of tagged scientific papers [19]. DBpedia is the semantic network of 
relations extracted from Wikipedia, of which we study the five largest bipar- 
tite relations [20]. Epinions is the rating network from the product review site 
Epinions.com [21]. Jester is a user-joke network [22]. MovieLens is a user-movie 
rating dataset, and a folksonomy of tags attached to these movies [23]. Netflix 
is the large user-item rating network associated with the Netflix Prize [24]. The 
Wikipedia edit graphs are the bipartite user-article graphs of edits on various 
language Wikipedias. The Wikipedia categories are represented by the bipar- 
tite article-category network [25]. All datasets are bipartite and unweighted. In 
rating datasets, we only consider the presence of a rating, not the rating itself. 
Table 1 gives the number of nodes and edges in each dataset. 

In the experiments, we withhold 30% of each network's edges as the test set 
to predict. For datasets in which edges are labeled by timestamps, the test set 
consists of the newest edges. The remaining training set is used to compute link 
prediction scores using the preferential attachment model and the pseudokernel 
learning methods described in the previous sections. For the pseudokernel learn- 
ing methods, the training set is again split into 70% / 30% subsets for training. 
Link prediction accuracy is measured by the mean average precision (MAP), 
averaged over all users present in the test set [26]. The evaluation results are 
summarized in Table 1. 
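
For reference, the following sketch in Python with NumPy shows the standard definition of mean average precision as we assume it is applied here: for each user with at least one test edge, items are ranked by predicted score, precision is averaged over the ranks of the test edges, and the result is averaged over users. The function names and the toy data are hypothetical, and details of the original protocol, such as the handling of training edges in the ranking, are not specified in the text and are not modeled.

```python
import numpy as np

def average_precision(scores, relevant):
    """Average precision of one ranking; `relevant` is a 0/1 vector of test edges."""
    order = np.argsort(-scores, kind="stable")     # best-scored items first
    hits = relevant[order]
    if hits.sum() == 0:
        return 0.0
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_rank * hits).sum() / hits.sum())

def mean_average_precision(S, B_test):
    """Mean of the average precision over all users that have test edges."""
    users = np.flatnonzero(B_test.sum(axis=1) > 0)
    return float(np.mean([average_precision(S[u], B_test[u]) for u in users]))

# Hypothetical scores for 3 users and 4 items; the third user has no test edges.
S = np.array([[0.9, 0.1, 0.8, 0.3],
              [0.2, 0.7, 0.1, 0.4],
              [0.5, 0.5, 0.5, 0.5]])
B_test = np.array([[1, 0, 0, 1],
                   [0, 1, 0, 0],
                   [0, 0, 0, 0]])

map_score = mean_average_precision(S, B_test)
```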



Table 1. Overview of datasets and experiment results. See the text for a 
description of the datasets and link prediction methods. Link prediction methods: Poly: odd 
polynomials, NN-poly: odd nonnegative polynomials, Sinh: hyperbolic sine, Red: rank 
reduction, Odd Neu: odd von Neumann pseudokernel, Pref: preferential attachment. 

4.2 Detecting Near-bipartite Networks 

Some networks are not bipartite, but nearly so. An example would be a 
network of "fan" relationships between persons where there are clear "hubs" and 
"authorities", i.e. popular persons and persons who are fans of many people. While 
these networks are not strictly bipartite, they are mostly bipartite in a sense 
that has to be made precise. Measures for the level of bipartivity exist in sev- 
eral forms [4,27], and spectral transformations offer another method. Using the 
link prediction method described in Section 3.3, nearly bipartite graphs can be 
recognized by the odd shape of the learned curve fitting function. 

Figure 5 shows the method applied to two unipartite networks: the Advogato 
trust network [28] and the hyperlink network in the English Wikipedia [25]. 
The curves indicate that the Advogato trust network is not bipartite, while the 
Wikipedia link network is nearly so. 

(a) Advogato trust network (b) English Wikipedia hyperlinks 

Fig. 5. Detecting near-bipartite and non-bipartite networks: If the hyperbolic sine fits, 
the network is nearly bipartite; if the exponential fits, the network is not nearly 
bipartite. (a) The Advogato trust network, (b) the English Wikipedia hyperlink network. 
These graphs show the learned transformation of a graph's eigenvalues; see the text 
for a detailed description. 

5 Discussion 

While technically the link prediction problem in bipartite graphs is a subproblem 
of the general link prediction problem, the special structure of bipartite graphs 
makes common link prediction algorithms ineffective. In particular, all methods 
based on the triangle closing model cannot work in the bipartite case. Out of 
the simple local link prediction methods, only the preferential attachment model 
can be used in bipartite networks. 

Algebraic link prediction methods can be used instead, by restricting spectral 
transformations to odd functions, leading to the matrix hyperbolic sine as a link 
prediction function, and an odd variant of the von Neumann kernel. As in the 
unipartite case, no single link prediction method is best for all datasets. 